This project studies how economic performance influences public approval of national governments over time. The dataset has 9 variables covering 111 countries from 1990 to 2023.
Table 1: A sample of the dataset (filtered to the USA). Each column represents a variable, each row represents an observation, and each cell holds the value of the corresponding variable
| country_name | country_code | year | approval_smoothed | approval_growth | gdp_pc | gdp_pc_growth | unemployment_rate | cpi_growth |
|---|---|---|---|---|---|---|---|---|
| United States | USA | 1990 | 53.51480 | 4.7219676 | 44378.52 | NA | 5.62 | 5.40 |
| United States | USA | 1991 | 56.38072 | 5.3553726 | 43742.03 | -1.434248 | 6.82 | 4.23 |
| United States | USA | 1992 | 45.20927 | -19.8143060 | 44659.15 | 2.096669 | 7.51 | 3.03 |
| United States | USA | 1993 | 44.96474 | -0.5408824 | 45286.94 | 1.405723 | 6.90 | 2.95 |
| United States | USA | 1994 | 44.03471 | -2.0683562 | 46537.36 | 2.761109 | 6.12 | 2.61 |
| United States | USA | 1995 | 43.86478 | -0.3859001 | 47220.96 | 1.468929 | 5.65 | 2.81 |
1.1 Country Coverage
Figure 1: Map showing the dataset’s geographical coverage. Countries covered by the dataset are shaded blue and countries with no observations are coloured white
Figure 1 shows the dataset covers a wide range of countries, enhancing the external validity of the analysis.
2 Data Quality Assessment
Variables will be individually examined in concise paragraphs due to the scale of the dataset and constraints of word count.
2.1 gdp_pc
GDP per capita in 2021 PPP USD ($), from The World Bank (2025), is a covariate accounting for countries’ base income level. It is calculated as $\text{gdp\_pc} = \frac{\text{GDP}}{\text{population}}$, with currency converted to 2021 US purchasing power parity rates.
Table 2: Tables for gdp_pc. Subsequent tables under this section will share the same format and comments will be kept brief
(a) Data diagnosis table. Each column represents a diagnostic metric for the variable and outliers are classified based on the 1.5 IQR rule
| Variable | Type | Unique Values | Total Values | Missing Values | Missing Proportion (%) | Outliers Count | Zero Count |
|---|---|---|---|---|---|---|---|
| gdp_pc | double | 3730 | 3730 | 44 | 1.17 | 85 | 0 |
(b) Summary statistics table. Each column represents a summary statistic for the variable
| Variable | Mean | Standard Deviation | Minimum | 5th Percentile | 25th Percentile | Median | 75th Percentile | 95th Percentile | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| gdp_pc | 23035.64 | 22353.66 | 534.81 | 1653.59 | 5936.38 | 14954.72 | 35821.23 | 62638.93 | 137821.4 |
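The diagnostic metrics in Table 2 (a) can be reproduced with a few lines of code. The following is a minimal Python sketch rather than the R code behind the report; the function name `diagnose` and the toy input are hypothetical. It assumes outliers are values lying more than 1.5 IQR below the first quartile or above the third quartile, and that "Total Values" counts non-missing observations, which is consistent with the totals in the tables.

```python
import math
import statistics


def diagnose(values):
    """Return table-style diagnostic metrics for one variable.

    `values` may contain None (or NaN) for missing observations.
    """
    observed = [v for v in values if v is not None and not math.isnan(v)]
    missing = len(values) - len(observed)
    # method="inclusive" matches the default quantile definition in R (type 7).
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "unique": len(set(observed)),
        "total": len(observed),
        "missing": missing,
        "missing_pct": round(100 * missing / len(values), 2),
        "outliers": sum(1 for v in observed if v < lower or v > upper),
        "zeros": sum(1 for v in observed if v == 0),
    }


# Toy input (hypothetical values, not taken from the dataset).
toy = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 100.0, None, 0.0]
report = diagnose(toy)
```

In the toy input only the value 100.0 falls outside the 1.5 IQR fences, so it is the single outlier counted.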
(a) KDE plot of gdp_pc (a Gaussian kernel and a bandwidth of 2000 are used). The x-axis indicates values of the variable and the y-axis represents the estimated density at each value, such that the area under the curve equals 1. A rug plot is provided at the bottom, with each tick mark representing a value of the variable
(b) Quantile plot for gdp_pc. The x-axis indicates quantiles of a normal distribution and the y-axis represents quantiles of the variable. Points aligning with the red reference line suggest the variable follows a normal distribution. Red data points highlight the minimum/maximum values, labeled with their corresponding ISO3 country code, year, and value
Figure 2: Kernel density estimation (KDE) and quantile plots for gdp_pc. Subsequent plots under this section will share the same format and comments will be kept brief
Figure 2 suggests the distribution of gdp_pc is right-skewed with considerable variation and no notable gaps in data. Table 2 (b) supports this skewness, as the mean ($23,035.64) exceeds the median ($14,954.72), reflecting the historically substantial income disparities between countries.
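To make the KDE curves concrete, the sketch below shows how a Gaussian kernel density estimate is built: each observation contributes a small normal curve and the curves are averaged. It is a minimal Python illustration, not the plotting code used for Figure 2. It assumes the bandwidth is the standard deviation of the Gaussian kernel (as in R's `density()`), and the sample values are hypothetical.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density of `data` at a point x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Toy incomes (hypothetical), with the bandwidth of 2000 quoted in the caption.
sample = [1500.0, 6000.0, 15000.0, 36000.0, 62000.0]
kde = gaussian_kde(sample, bandwidth=2000)

# Riemann-sum check that the area under the curve is close to 1.
step = 100
area = sum(kde(x) * step for x in range(-20000, 90000, step))
```

A larger bandwidth widens each bump and smooths the curve; a smaller one reveals more local detail at the cost of noise.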
2.2 gdp_pc_growth
Percentage change in gdp_pc from the previous year, derived in R. It is the independent variable, measuring countries’ economic performance over time.
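The derivation amounts to $\text{growth}_t = \frac{x_t - x_{t-1}}{x_{t-1}} \times 100$, applied within each country so that one country's first year is never compared with another country's last year. The Python sketch below is illustrative only (the report's derivation was done in R, and the function name `pct_growth` is hypothetical). It uses the gdp_pc values for the United States from Table 1.

```python
def pct_growth(series):
    """Year-on-year % change; the first year has no predecessor, so it is None."""
    growth = [None]
    for prev, curr in zip(series, series[1:]):
        if prev is None or curr is None or prev == 0:
            growth.append(None)  # undefined when a value is missing or zero
        else:
            growth.append((curr - prev) / prev * 100)
    return growth


# gdp_pc for the United States, 1990-1993, as shown in Table 1.
usa_gdp_pc = [44378.52, 43742.03, 44659.15, 45286.94]
usa_growth = pct_growth(usa_gdp_pc)
```

The results agree with the gdp_pc_growth column in Table 1 up to the rounding of gdp_pc, including the NA for 1990.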
Table 3: Tables for gdp_pc_growth
(a) Data diagnosis table
| Variable | Type | Unique Values | Total Values | Missing Values | Missing Proportion (%) | Outliers Count | Zero Count |
|---|---|---|---|---|---|---|---|
| gdp_pc_growth | double | 3619 | 3619 | 155 | 4.11 | 281 | 0 |
(b) Summary statistics table
| Variable | Mean | Standard Deviation | Minimum | 5th Percentile | 25th Percentile | Median | 75th Percentile | 95th Percentile | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| gdp_pc_growth | 1.96 | 5.53 | -64.42 | -6.15 | 0.28 | 2.19 | 4.34 | 8.49 | 90.83 |
(a) KDE plot for gdp_pc_growth. Bandwidth is set to 0.5
(b) Quantile plot for gdp_pc_growth
Figure 3: KDE and quantile plots for gdp_pc_growth
Figure 3 suggests gdp_pc_growth has a roughly symmetric (slightly left-skewed) distribution with no notable gaps in data, numerous outliers (281, Table 3 (a)), and considerable variation. This variation reflects positive and negative economic shocks, which are useful to the analysis.
2.3 approval_smoothed
Approval of the national government (% of survey respondents), smoothed via exponential smoothing, from Carlin et al. (2023) is the dependent variable measuring public approval. It is constructed from survey data collected from numerous sources that ask respondents in countries with competitive elections whether they approve of their executive.
Table 4: Tables for approval_smoothed
(a) Data diagnosis table
| Variable | Type | Unique Values | Total Values | Missing Values | Missing Proportion (%) | Outliers Count | Zero Count |
|---|---|---|---|---|---|---|---|
| approval_smoothed | double | 2389 | 2412 | 1362 | 36.09 | 22 | 0 |
(b) Summary statistics table
| Variable | Mean | Standard Deviation | Minimum | 5th Percentile | 25th Percentile | Median | 75th Percentile | 95th Percentile | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| approval_smoothed | 47.32 | 14.54 | 3.94 | 25.59 | 36.96 | 46.47 | 56.51 | 73.34 | 95.65 |
(a) KDE plot for approval_smoothed. Bandwidth is set to 2.5
(b) Quantile plot for approval_smoothed
Figure 4: KDE and quantile plots for approval_smoothed
Figure 4 suggests the distribution of approval_smoothed is roughly symmetric (slightly right-skewed) with no notable gaps in data. However, Table 4 (a) highlights rather extreme outliers (22) and a large proportion of missing data (36.09%); this will be addressed later. Nonetheless, the remaining 2,412 observed values (Table 4 (a)) are sufficiently numerous and representative for the analysis.
2.4 approval_growth
Percentage change in approval_smoothed from the previous year, derived in R. It is a covariate capturing relative changes in approval_smoothed, which allows a like-for-like comparison with gdp_pc_growth.
Table 5: Tables for approval_growth
(a) Data diagnosis table
| Variable | Type | Unique Values | Total Values | Missing Values | Missing Proportion (%) | Outliers Count | Zero Count |
|---|---|---|---|---|---|---|---|
| approval_growth | double | 2314 | 2335 | 1439 | 38.13 | 192 | 22 |
(b) Summary statistics table
| Variable | Mean | Standard Deviation | Minimum | 5th Percentile | 25th Percentile | Median | 75th Percentile | 95th Percentile | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| approval_growth | 2.03 | 24.58 | -71.63 | -23.47 | -8.43 | -0.51 | 7.47 | 34.02 | 455.31 |
(a) KDE plot for approval_growth. Bandwidth is set to 2
(b) Quantile plot for approval_growth
Figure 5: KDE and quantile plots for approval_growth
Figure 5 suggests the approval_growth distribution is right-skewed with no notable gaps in data and many extreme outliers requiring further examination. Table 5 (a) also reports 22 zero values, which are more likely attributable to the exponential smoothing of approval_smoothed than to genuine stability in approval.
2.5 cpi_growth
Percentage change in the consumer price index (CPI) from the previous year, from the International Monetary Fund (2025). It is a covariate accounting for inflation. CPI measures the price level of a basket of goods and services that typical households consume. The basket and its price are determined by household surveys and supplier data respectively.
Table 6: Tables for cpi_growth
(a) Data diagnosis table
| Variable | Type | Unique Values | Total Values | Missing Values | Missing Proportion (%) | Outliers Count | Zero Count |
|---|---|---|---|---|---|---|---|
| cpi_growth | double | 1729 | 3520 | 254 | 6.73 | 361 | 1 |
(b) Summary statistics table
| Variable | Mean | Standard Deviation | Minimum | 5th Percentile | 25th Percentile | Median | 75th Percentile | 95th Percentile | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| cpi_growth | 31.3 | 452.84 | -16.12 | -0.2 | 1.96 | 4.13 | 8.88 | 36.82 | 23773.13 |
(a) KDE plot for cpi_growth. R’s default bandwidth is applied
(b) Quantile plot for cpi_growth. Minimum value label is hidden due to data point obstruction
Figure 6: KDE and quantile plots for cpi_growth
Figure 6 (b) indicates that an extreme outlier (Congo during its civil war in 1994) flattens the distribution. A logarithmic transformation is therefore applied.
(a) KDE plot for cpi_growth_log10. Bandwidth is set to 0.05
(b) Quantile plot for cpi_growth_log10
Figure 7: KDE and quantile plots for log10-transformed cpi_growth (cpi_growth_log10). A constant was added to cpi_growth to shift its minimum value (-16.12, Table 6 (b)) above zero, ensuring only positive values are used for the log transformation
Following the transformation in Figure 7, the distribution is right-skewed with no notable gaps in data. Figure 7 (b) also reveals many outliers with greater concentration in the right tail, reflecting the historical tendencies for inflationary crises.
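The shift-then-log transformation can be sketched as follows. This is a minimal Python illustration, not the report's R code. The target minimum of 1 after shifting (`target_min`) is a hypothetical choice made only so the example runs; the constant actually used in the report is not restated here. The first and last toy values are the minimum and maximum from Table 6 (b).

```python
import math


def shifted_log10(values, target_min=1.0):
    """Shift values so the minimum equals `target_min`, then take log10.

    `target_min` is a hypothetical choice for illustration only.
    Missing values (None) are passed through unchanged.
    """
    observed = [v for v in values if v is not None]
    shift = target_min - min(observed)
    transformed = [None if v is None else math.log10(v + shift) for v in values]
    return transformed, shift


# Toy inflation rates (%); -16.12 and 23773.13 come from Table 6 (b).
cpi = [-16.12, 1.96, 4.13, None, 8.88, 23773.13]
cpi_log10, shift = shifted_log10(cpi)
```

Because the logarithm compresses large values far more than small ones, the extreme maximum no longer dominates the scale after the transformation.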
2.6 unemployment_rate
Unemployment rate (% of labour force), from the International Monetary Fund (2025), is a covariate accounting for unemployment. The variable is measured using labour force surveys, where respondents without work who are available for and actively seeking work are considered unemployed.
Table 7: Tables for unemployment_rate
(a) Data diagnosis table
| Variable | Type | Unique Values | Total Values | Missing Values | Missing Proportion (%) | Outliers Count | Zero Count |
|---|---|---|---|---|---|---|---|
| unemployment_rate | double | 1388 | 2755 | 1019 | 27 | 156 | 0 |
(b) Summary statistics table
| Variable | Mean | Standard Deviation | Minimum | 5th Percentile | 25th Percentile | Median | 75th Percentile | 95th Percentile | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| unemployment_rate | 8.29 | 5.77 | 0.04 | 2.08 | 4.39 | 6.84 | 10.31 | 19.71 | 38.8 |
(a) KDE plot for unemployment_rate. Bandwidth is set to 0.7
(b) Quantile plot for unemployment_rate. Minimum value label is hidden due to data point obstruction
Figure 8: KDE and quantile plots for unemployment_rate
Figure 8 suggests the unemployment_rate distribution is right-skewed with no notable gaps in data, many outliers, and a heavier right tail, reflecting numerous episodes of historically high unemployment. Table 7 (a) reports that 27% of values are missing, a proportion that fluctuates over time as shown in Figure 9, likely due to irregular survey frequencies.
2.7 Visualizing Missingness
Figure 9: Proportion of missing data for each variable (y-axis) over time (x-axis). Derived variables are omitted because they share the same missingness patterns as their source variable
Figure 9 suggests approval_smoothed has the most missing data over time, which will be further explored in Figure 10. Missingness for the other variables is smaller, although unemployment_rate is still missing 27% of its values (Table 7 (a)).
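The quantity plotted in Figure 9, the share of missing values per year, can be computed as in the minimal Python sketch below. The row layout (one dictionary per country-year) and the toy values are assumptions for illustration, not the report's actual data structure or R code.

```python
from collections import defaultdict


def missing_share_by_year(rows, variable):
    """Return {year: % of rows in that year where `variable` is None}."""
    totals, missing = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row["year"]] += 1
        if row[variable] is None:
            missing[row["year"]] += 1
    return {year: 100 * missing[year] / totals[year] for year in sorted(totals)}


# Toy panel (hypothetical values): two countries observed in two years.
rows = [
    {"country_code": "AAA", "year": 1990, "approval_smoothed": None},
    {"country_code": "BBB", "year": 1990, "approval_smoothed": 51.2},
    {"country_code": "AAA", "year": 1991, "approval_smoothed": 48.7},
    {"country_code": "BBB", "year": 1991, "approval_smoothed": 50.3},
]
share = missing_share_by_year(rows, "approval_smoothed")
```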
Figure 10: Heatmap of missing approval_smoothed values by country (y-axis) and year (x-axis). Blue squares indicate missing values and white squares indicate observed values. Countries are grouped by quartiles of their mean gdp_pc from 1990 to 2023
Figure 10 suggests missing approval_smoothed data correlates with countries’ gdp_pc, with higher-income countries generally exhibiting less missing data overall, likely due to greater resource capacity for data collection. However, missing data declined for all countries over time, suggesting that other factors, such as technological advancements reducing the cost of survey administration, also played a role. Factors like warfare can also affect data availability. Missing data in recent years may reflect reporting delays, which vary by survey source.
3 Independent and Dependent Variable Relationship Assessment
3.1 Theoretical Conjecture
Denoting Variables:
Independent variable ($X$): % change of GDP per capita in 2021 PPP USD from the previous year (gdp_pc_growth).
Dependent variable ($Y$): Approval of national government smoothed via exponential smoothing (approval_smoothed).
Relationship Rationale:
$X$ is often assumed to be positively correlated with $Y$ because economic performance is a core responsibility of the government that impacts the living standards, happiness, and even survival of the individuals whose opinions determine public approval of their government. Furthermore, individuals often associate their financial hardships with poor governance regardless of whether the government is directly responsible.
Necessary Considerations:
Non-linearity: The relationship between $X$ and $Y$ may not be strictly linear because it can reach a threshold, similar to a concave utility function, beyond which further economic growth yields diminishing returns in approval as individuals shift their attention to other government responsibilities. Potential solutions to address this include:
Non-linear regression (e.g. Lowess): Visualizes non-linear patterns on scatter plots.
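To illustrate what a Lowess-style smoother does, the sketch below fits a weighted straight line around each point using tricube weights and keeps the fitted value at that point. It is a simplified, single-pass Python version (no robustness iterations), written for illustration. The function name `lowess_sketch`, the `frac` default, and the toy data are assumptions, and an established implementation would normally be preferred in practice.

```python
import math


def lowess_sketch(xs, ys, frac=0.5):
    """Locally weighted linear regression (single pass, no robustness step).

    For each x0, fit a weighted line to the nearest `frac` share of points
    using tricube weights and return the fitted value at x0.
    """
    n = len(xs)
    k = max(2, int(round(frac * n)))  # neighbours used in each local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # distance to the k-th nearest neighbour
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Toy data (hypothetical): approval rises with growth but flattens out.
growth = [float(g) for g in range(0, 11)]
approval = [40 + 5 * math.sqrt(g) for g in growth]
smooth = lowess_sketch(growth, approval, frac=0.5)
```

Plotting `smooth` against `growth` on top of the scatter plot would show the curve bending where diminishing returns set in, which a single straight regression line would hide.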
Endogeneity: Occurs when the relationship between $X$ and $Y$ is confounded by variables omitted from the analysis that are correlated with $X$ and also influence $Y$. Omitting these variables can produce misleading conclusions about the bivariate relationship. To address this omitted variable bias, I included covariates in the dataset to control for alternative explanations.
Confounds: Changes in $Y$ may not be solely due to $X$; confounders such as cpi_growth or unemployment_rate may also influence $Y$, or both $X$ and $Y$.
Consider gdp_pc (more confounds explored in stage 6): The correlation between $X$ and $Y$ may be weaker for high-gdp_pc countries because economic performance matters less to individuals there than to those in low-gdp_pc countries. To address this, I will
Group observations by country and calculate each country’s mean gdp_pc,
Stratify countries into intervals of mean gdp_pc (e.g. quartiles),
Run regressions of $Y$ on $X$ in each stratum and analyze changes in correlation using appropriate metrics and visualizations.
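The three steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the planned analysis itself. It uses the Pearson correlation within each stratum as a simple stand-in for the regressions, the row layout is assumed, and the toy data are hypothetical values constructed so that approval responds less to growth in richer countries.

```python
import statistics
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def correlation_by_income_quartile(rows):
    """rows: dicts with country_code, gdp_pc, gdp_pc_growth, approval_smoothed."""
    # Step 1: mean gdp_pc per country.
    by_country = defaultdict(list)
    for r in rows:
        if r["gdp_pc"] is not None:
            by_country[r["country_code"]].append(r["gdp_pc"])
    means = {c: statistics.fmean(v) for c, v in by_country.items()}

    # Step 2: quartile cut points of the country means.
    q1, q2, q3 = statistics.quantiles(list(means.values()), n=4, method="inclusive")

    def stratum(m):
        return 1 if m <= q1 else 2 if m <= q2 else 3 if m <= q3 else 4

    # Step 3: correlation within each stratum (complete cases only).
    pairs = defaultdict(list)
    for r in rows:
        x, y = r["gdp_pc_growth"], r["approval_smoothed"]
        if r["country_code"] in means and x is not None and y is not None:
            pairs[stratum(means[r["country_code"]])].append((x, y))
    return {s: pearson(*zip(*p)) for s, p in sorted(pairs.items()) if len(p) > 2}


# Toy panel (hypothetical): four countries, one per income quartile.
toy = []
for i, code in enumerate(["AAA", "BBB", "CCC", "DDD"]):
    for t, g in enumerate([-2.0, 0.0, 1.0, 3.0]):
        toy.append({
            "country_code": code,
            "gdp_pc": 1000.0 * (i + 1) ** 2,
            "gdp_pc_growth": g,
            "approval_smoothed": 50.0 + g * (4 - i) + (-1) ** t * i,
        })
result = correlation_by_income_quartile(toy)
```

In the toy data the correlation falls from the poorest to the richest stratum, which is the pattern the conjecture would predict.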
3.2 Graphical Examination
This section examines the underlying distributions of $X$ and $Y$ using visualizations.
Figure 11: KDE plot of gdp_pc_growth and approval_smoothed z-scores (ensures a comparable scale) using a bandwidth of 0.15. Rug plots color-coded and separated by variable are added at the bottom, with tick marks representing values
Figure 11 indicates both variables share similar distribution shapes, i.e., unimodal and roughly symmetric. While $Y$ appears more spread out overall and $X$ is more concentrated at its centre, the rug plots reveal that $X$ has a wider range with more extreme outliers on both ends.
(a) Q-Q plot for gdp_pc_growth and approval_smoothed with default axes limits
(b) Q-Q plot for gdp_pc_growth and approval_smoothed with axes limits set to (-6, 6) for clearer visual assessment of the distribution
Figure 12: Quantile-Quantile (Q-Q) plots comparing quantiles of gdp_pc_growth (x-axis) and approval_smoothed (y-axis) z-scores. Standardization is used to ensure comparability and easier interpretation of the 45-degree line
Figure 12 reveals that while the centres of both distributions are similar (roughly following the red reference line), there are substantial deviations in the tails. Specifically, an S-shaped pattern is observed with noticeable “plateaus” at the extremes ($Y$ quantiles increasing more slowly than $X$ quantiles), suggesting $Y$ has a heavier tail and extreme cases occur more consistently. However, $X$ quantiles span a wider range, indicating higher extremes. The consistency of extreme values may suggest measurement validity issues.
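The construction behind a two-sample Q-Q plot of z-scores can be sketched as follows: both variables are standardized, and then the same set of quantiles is taken from each and paired. This is a minimal Python illustration with hypothetical toy values, not the code used for Figure 12.

```python
import statistics


def zscores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def qq_pairs(xs, ys, n=20):
    """Pair the same n-1 quantiles of two standardized samples."""
    qx = statistics.quantiles(zscores(xs), n=n, method="inclusive")
    qy = statistics.quantiles(zscores(ys), n=n, method="inclusive")
    return list(zip(qx, qy))


# Toy samples (hypothetical values).
growth = [-6.2, -1.0, 0.3, 1.9, 2.2, 2.8, 4.3, 9.5]
approval = [25.6, 37.0, 41.5, 46.5, 47.3, 52.1, 56.5, 73.3]
pairs = qq_pairs(growth, approval, n=4)
```

If the two distributions had the same shape, every pair would fall on the 45-degree line; departures in the first and last pairs correspond to differences in the tails.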
Since $Y$ relies on survey data, concerns about measurement are reasonable. Specifically, the consistency of extreme cases can stem from non-random selection of survey respondents. While all survey sources used by Carlin et al. (2023) are reputable (e.g. Gallup World Poll, Eurobarometer), they rely on voluntary surveys that naturally invite biases. These include voluntary response bias, where individuals with stronger opinions respond more often (leading to extreme cases), and non-response bias, where individuals who do not respond, and who often hold different beliefs, lack representation in the sample.
Despite these prevalent biases, Carlin et al. (2023) draw from a wide range of survey sources, offering a larger sample and reducing sampling variability, which helps to mitigate these biases. Winsorization could further reduce the effects of biases in $Y$ and extreme outliers in $X$ by replacing extreme values with percentile values (e.g. the 5th and 95th percentiles), constraining the data within a more realistic range. Figure 13 shows how Figure 12 changes after a 90% winsorization of $X$ and $Y$: the tails that initially produced the S pattern are effectively “trimmed”, emphasizing the middle quantiles that follow the reference line. Although winsorization can reduce the impact of outliers, it comes at the cost of suppressing potentially meaningful outliers for simplicity.
Figure 13: Q-Q plot of gdp_pc_growth and approval_smoothed z-scores after 90% winsorization. Axes limits are set to (-2.5, 2.5) with no data points beyond the limit
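A 90% winsorization can be sketched in a few lines: values below the 5th percentile are raised to it and values above the 95th percentile are lowered to it, so no observation is dropped. This is a minimal Python illustration, not the report's R code. The first and last toy values are the extremes from Table 3 (b); the values in between are hypothetical.

```python
import statistics


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values outside the given percentiles to those percentiles."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


# Toy growth rates (%); -64.42 and 90.83 are the extremes in Table 3 (b).
growth = [-64.42, -6.15, 0.28, 1.5, 2.19, 3.0, 4.34, 8.49, 90.83]
clipped = winsorize(growth)
```

Only the two extreme values change here; the interior observations are untouched, which is why winsorization preserves sample size while shrinking the tails.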
Figure 14: Line plot for z-scores of mean gdp_pc_growth and approval_smoothed for all countries across time. Standardization ensures variables are on comparable scales and fluctuations in values can be easily observed
Figure 14 indicates that $X$ and $Y$ fluctuate over time, with extreme shifts reflecting major global events (e.g. the COVID-19 pandemic). $Y$ appears to follow the trend of $X$ but with a lag of roughly 3 to 4 years. The observed lag may arise from irregular survey administration frequencies creating temporal gaps between shocks to $X$ and the recording of $Y$, distorting the immediate effects between the variables. Delays in economic shocks impacting public perception may also contribute to the lag: while $X$ quantifies immediate economic changes, $Y$ operates on a different timescale because respondents vary in their tolerance of and optimism about the government, often changing their attitude only after observing how the government handles economic shocks.
The lagged relationship will be explored in stage 6 by including values of $X$ lagged by whole-year intervals when running regression models (e.g. distributed lag and autoregressive models).
Figure 15: Line plots for z-scores of mean gdp_pc_growth and approval_smoothed for all countries across time. Plots are separated by lagged variants of gdp_pc_growth
Figure 15 indicates that all four lagged variants of $X$ overlap with $Y$ better than unlagged $X$, suggesting they may better explain $Y$ and should be explored further as independent variables when running regressions.
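One simple way to compare lagged variants numerically is to pair each year's approval with growth from some years earlier and compute the correlation for each lag. The Python sketch below is illustrative only. The function names and the toy series are hypothetical, the toy approval series is constructed to copy growth with a two-year delay, and in the real panel the lagging would need to be done within each country.

```python
def lagged_pairs(x, y, lag):
    """Pair y[t] with x[t - lag], skipping years with a missing value."""
    return [
        (x[t - lag], y[t])
        for t in range(lag, len(y))
        if x[t - lag] is not None and y[t] is not None
    ]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


# Toy series (hypothetical): y copies x with a two-year delay.
x = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, 4.0, -3.0]
y = [None, None] + x[:-2]
by_lag = {k: pearson(lagged_pairs(x, y, k)) for k in range(0, 4)}
best = max(by_lag, key=by_lag.get)
```

With the toy series the two-year lag gives a perfect correlation, so `best` picks it out; with real data the comparison across lags would be noisier and should be read alongside the plots.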
4 Conclusion
While current evidence suggests a weak but evident correlation between $X$ and $Y$, accounting for non-linearity and controlling for confounds may reveal stronger relationships. However, it is vital to first address underlying issues with $Y$, such as lag and non-random sampling biases, to ensure the analysis avoids the perils of “garbage in, garbage out”.
References
Carlin, R. E., Hartlyn, J., Hellwig, T., Love, G. J., Martínez-Gallardo, C., Singer, M. M., … Sert, H. (2023). Executive Approval Database 3.0. Retrieved from https://executiveapproval.org/